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Abstract. An iterative method LSMR is presented for solving linear systems Ax = b and least- 
squares problems min \\Ax — b\\2, with A being sparse or a fast linear operator. LSMR is based on the 
Golub-Kahan bidiagonalization process. It is analytically equivalent to the MINRES method applied 
to the normal equation A^Ax = A'^b, so that the quantities ||A^rj.|| are monotonically decreasing 
(where = b — Axk is the residual for the current iterate x^). We observe in practice that also 
decreases monotonically, so that compared to LSQR (for which only is monotonic) it is safer to 
terminate LSMR early. We also report some experiments with reorthogonalization. 
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1. Introduction. We present a numerical method called LSMR for computing 
a solution x to the following problems: 

minimize 
minimize 



Unsymmetric equations: 
Linear least squares (LS): 



Regularized least squares: minimize 



x\\2 subject to Ax = b 
Ax~b\\2 



where A G 



be 



and A > 0, with m < n or m > n. The matrix A is used as 
an operator for which products of the form Av and A^u can be computed for various 
V and u. (If A is symmetric or Hermitian and A = 0, MINRES-QLP jj] is applicable.) 

LSMR is similar in style to the well known method LSQR [El [17] in being based 
on the Golub-Kahan bidiagonalization of A [6]. LSQR is equivalent to the conjugate- 
gradient (CG) method applied to the normal equation {A^A + X^I)x = A^b. It has the 
property of reducing |jrfc|j monotonically, where rk — h — Ax}^ is the residual for the 
approximate solution Xk- (For simplicity, we are letting A = 0.) In contrast, LSMR 
is equivalent to MINRES [T5| applied to the normal equation, so that the quantities 
IjA^rfcll are monotonically decreasing. In practice we observe that |jrfc|| also decreases 
monotonically, and is never very far behind the corresponding value for LSQR. Hence, 
although LSQR and LSMR ultimately converge to similar points, it is safer to use 
LSMR in situations where the solver must be terminated early. 

Stopping conditions are typically based on backward error: the norm of some per- 
turbation to A for which the current iterate x^ solves the perturbed problem exactly. 
Experiments on many sparse LS test problems show that for LSMR, a certain cheaply 
computable backward error for each xj. is close to the optimal (smallest possible) 
backward error. This is an unexpected but highly desirable advantage. 
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1.1. Overview. Section 2 introduces the Golub-Kahan process and derives the 
basic LSMR algorithm with A = 0. Section 3 derives various norms and stopping 
criteria. Section 4 discusses singular systems and complexity. Section 5 derives the 
LSMR algorithm with A > 0. Section 6 describes backward error estimates. Section 7 
gives numerical results on a range of overdetermined and square systems. Section 8 
summarizes our findings, and Appendix A proves one of the main lemmas. 

1.2. Notation. Matrices are denoted by A,B,..., vectors by v,w,..., and 
scalars by a, /3, . . . . Two exceptions are c and s, which denote the significant com- 
ponents of a plane rotation matrix, with + = 1. For a vector v, \\v\\ always 
denotes the 2-norm of v. For a matrix A, \\A\\ usually denotes the Frobenius norm, 
and the condition number of a matrix A is defined by cond(A) = ||^||||A+||, where 

denotes the pseudoinverse of A. Vectors ei and denote columns of an identity 
matrix. Items like l3k and /3fc are about to change to something similar like f3k- 

2. Derivation of LSMR. We begin with the Golub-Kahan process [B], an it- 
erative procedure for transforming (b A) to upper-bidiagonal form (/3iei B^). 

2.1. The Golub-Kahan process. 

1. Set /3iUi = b (shorthand for /3i = ui = h/ Pi) and aiVi = A^ui. 

2. For fc = 1,2, . . . , set 



Pk+iUk+i = Avk - akUk and ak+iVk+i = A Uk+i - Pk+iVk- 



(2.1) 



After k steps, we have 

AVk = 

where we define Vk — {vi 



Uk+iBk and A^Uk+i = Vfe+iL^+i, 



V2 



P2 



Vk), Uk = {Ul U2 
\ 



Uk), and 



Q!2 



Bk — 

Now consider 

A^AVk 



Pk 



ak 



Lk+i — {Bk ak+iCk+i) ■ 



A^Uk+iBk — Vk+iL\j^iBk 



Vk+i{ % 



Bk 



= Vk 



+1 



ak+iPk+ie,, 



This is equivalent to what would be generated by the symmetric Lanczos process with 
matrix A^A and starting vector A^b. (For this reason we define pk = ctkPk below.) 

2.2. Using Golub-Kahan to solve the normal equation. Krylov subspace 
methods for solving linear equations form solution estimates Xk = VkVk for some j/fc, 
where the columns of Vk are an expanding set of theoretically independent vectors. 
(In this case, Vk and also Uk are theoretically orthonormal.) 

For the equation A^Ax = A^, any solution x has the property of minimizing ||r|j, 
where r ^ b ~ Ax is the corresponding residual vector. Thus, in the development of 
LSQR it was natural to choose yk to minimize \\rk\\ at each stage. Since 



rk^b- AVkVk = Piui - Uk+iBkVk = C/fe+i(Aei - BkVk), 
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where Uk+i is theoretically orthonormal, the subproblem niinj^^. ||/^iei — BkykW easily 
arose. In contrast, for LSMR we wish to minimize ||A"^rfc||. Let jSj, = akPk for all k. 
Since A^rk — A^b — A'^Axk — — A'^AVkUk, we have 



A Pivi- Vfc+1 
and we are led to the subproblem 



ak+ A+iel^y' 



Vk+i Pie 



I 7/, 

T Vk 



min|j^ Tfel 

Vk 



mm 

Vk 



^k^k , 



(2.2) 



Efficient solution of this LS subproblem is the heart of algorithm LSMR. 

2.3. Two QR factorizations. As in LSQR, we form the QR factorization 

(Pi O2 \ 



Qk+lBk — 



Rk 




Rk = 



P2 



V 



PkJ 



(2.3) 



If we define tk = RkVk and solve R]^qk = Pk+iek, we have qk = {I3k+i/pk)ek = (/'fcCfc 
with pk = {Rk)kk and {pk = (3k+i/pk- Then we perform a second QR factorization 



Q 



k+l 



Rk 

fkel 







Rk _Zk 

Ck+i 



Rk — 



172 
P2 



Combining what we have with (2.2 1 gives 



min||A rk\ 

Vk 



mm 

Vk 



R^Rk 
llRk 



Vk 



mm 

tk 



mm 

tk 



(fikel 



_Zk 

Cfc+i 







Ok 

PkJ 



tk 
tk 



(2.4) 



(2.5) 



The subproblem is solved by choosing tk from Rktk — Zk- 



2.4. Recurrence for Xk- Let Wk and Wk be computed by forward substitution 
from RlW^ = and RlW^ = . Then from Xk = VkVk, RkVk = ifc, and 
Rktk — Zk, vfe have a;o = and 



Xk = WkRkVk = Wktk = WkRktk = WkZk = Xk-i + CkWk 
2.5. Recurrence for Wk and Wk- If we write 

Vk = {Vl V2 ■ ■■ Vk) , Wk = {wi W2 ■ 

Wk = {wi W2 ■■■ Wk) , Zk ^ {Ci C2 ■ ■ ■ Ck) 



Wk) 
T 



an important fact is that when k increases to fc + 1, all quantities remain the same 
except for one additional term. 
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The first QR factorization proceeds as follows. At iteration k we construct a plane 
rotation operating on rows / and I + 1: 



Pi = 



\ 



Cl 

-si 



Sl 
Cl 



\ 



h-i-ij 



Now if Qk+i = Pfe . . . P2-P1, we have 



Qk+lBk+l 



Q 



k+l 



Qk+2Bk 



+1 




/3/C+2 

ak+i = ( 

Pk+2 J 




and we sec that Ok+i = SfcOfc+i = {(3k+i/ Pk)ak+i = Pk+i/pk = Vk- Therefore we can 
write Ok+i instead of ipk- 

For the second QR factorization, if Qfc+i = Pk ■ ■ ■ P2P1 we know that 



Q 



k+l 



( Rl 



Rk 





and so 



Q 



fe+2 



\0k+2ek+i) 



k+l 




Rk 




(2.6) 



V^_^-^ and the last 



By considering the last row of the matrix equation iJ^^^W^^ 
row of R'^_^_iW^_^-^ = W^_^-^ we obtain equations that define Wk+i and Wk+i- 

Ok+iWk + Pk+iWk+i = Wfc+i, 
h+iwl + Pk+iwl+i = w^+i. 

2.6. The two rotations. To summarize, the rotations Pk and Pk have the 
following effects on our computation: 



Cfc 
-Sk 



Sk 
Ck 



oik 

Pk+1 



Ck 
-Sk 



Sk 
Ck 



Ck-lPk 
&k+l 



ak+1 
Ck 



Pk+l 



Pk 



Pk 




dk + 1 

ak+1 

Qk+l 
CkPk+1 



_Ck 
Cfe+1 



2.7. Speeding up forward substitution. The forward substitutions for com- 
puting w and w) can be made more efficient if we define hk = Pk^k and hk = PkPkWk- 
We then obtain the updates described in part 6 of the pseudo-code below. 

2.8. Algorithm LSMR. The following summarizes the main steps of algorithm 
LSMR for solving Ax « 6, excluding the norms and stopping rules developed later. 

1. (Initialize) 



Co 



aiVi 
So 



Ui 



ai 
hi 



0L\ 



Cl 

/lo 







Po 
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2. For fc = 1, 2, 3 . . . , repeat steps 3-6. 

3. (Continue the bidiagonalization) 



(3k+iUk+i = Avk - akUk, 

4. (Construct and apply rotation Pk) 

Pk = (<5fc + Pl+i) ' Cfc = ak/pk 

Sk+i = SkOtk+i a/c+i = Ckak+i 

5. (Construct and apply rotation Pk) 



afc+iWfe+i = A Uk+i - Pk+iVk 

Sk = Pk+l/Pk 



(2.7) 
(2.8) 



pk = ((Cfe„l/9fe)^ 
Sfe — Qk+l/Pk 
Ck+l — —SkCk 



(2.9) 
(2.10) 



ffc = Sk-lPk 
Cfe = Ck-lPk/Pk 

Cfe = cfcCfc 

6. (Update ft,, h x) 

hk^hk- [OkPk/ {Pk-iPk~i))hk~i 

Xk = Xk-l + {Ck/iPkPk))hk 

hk+i = Vk+i - {Ok+i/pk)hk 

3. Norms and stopping rules. Here we derive \\rk\\, \\A'^rk\\, \\xk\\ and esti- 
mates of ll^ll and cond(A) for use within stopping rules. All quantities require 0(1) 
computation at each iteration. 

3.1. Computing ||rfc|| . We transform Kj^ to upper-bidiagonal form using a third 
QR factorization: Rk = QkRk with Qk — Pk-i ■ ■ - Pi- This amounts to one additional 
rotation per iteration. Now let 

Qk 

I, 



tk 



]ktk, 



bk = 



Qk+ieiPi. 



(3.1) 



Then rk = b - Axk = /3iui - AVkVk = C/fe+iei/3i - Uk+iBkUk gives 



Tk = Uk+i ( ei/3i - Qfc+i 



Rk 





Vk ] = Uk+1 ( ei/3i - Qfc+i 



Uk 



+1 



Qk+l 



Uk+iQk+i 



Ql 



1 



bk 



k+l 



Qlh 





Therefore, assuming orthogonality of Uk+i, we have 



\'rk\\ = 



bk- 



The vectors hk and tk can be written in the form 



Tk-1 



Tk)' 



(3.2) 



(3.3) 



Zk- 



The vector tk can be computed by forward substitution from Rj^tk 
Lemma 3.1. In = f, for i = 1, . . . ,k - I. 

Proof. Appendix |A| proves the lemma by induction. □ 

Using this lemma we can estimate ||rfc|| from just the last two elements of bk and 
the last element of tk, as shown in (3.6) below. 
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3.1.1. Pseudo-code for computing \\rk\\- The following summarizes how ||rfc| 
may be obtained from quantities arising from the first and third QR factorizations. 

1. (Initialize) 

i3i=Pi /3o = po = l f-i = ^0 = Co = 

2. For the fcth iteration, repeat steps 3-6. 

3. (Apply rotation Pfc) 



Pk — CkPk Pk+l — —SkPk 

4. (If fc > 2, construct and apply rotation Pk-i) 
Pk-i 



Cfe-l = pk-ilpk- 



fk — Sk~lPk 
Pk-1 = Ck-lPk-l + Sk-lPk 

5. (Update tk by forward substitution) 

Tk-l — (Cfc-1 ^ Qk-lTk-2)/pk- 

6. (Form ||rfe||) 



Pk 



h/p. 



•k~l 
Ck-lPk 
-Sk-lPk- 



(3.4) 



(3.5) 



1 + Ck-lPk 



Tk = (Cfe - 0kfk-l)/Pk 



l={k- Tk? 



i^k+n 



\rk\ 



From (2.5) we have ||^ rfe| 



3.2. Computing ||A^rfc| 
is monotonically decreasing. 

3.3. Computing |ja:;fc||. From section 
the third QR factorization QkR]^ = Rk in section 
QkiQkRkV = Rk we can write 



V7 (3.6) 
|Cfc+i|, which by ([2l0t 



2.4 



we have Xk — VkRj. Rj. Zk- From 
and a fourth QR factorization 



3.1 



Xk 



'^RkQk^k — VkRk ^QkQkRkOkZk — VkQ^ 

where Zk and Zk are defined by forward substitutions R^^ 



k ^k, 

Zk and Rlzk = h- 

Assuming orthogonality of Vk we arrive at the estimate = \\zk\\. Since only 

the last diagonal of Rk and the bottom 2x2 part of Rk change each iteration, this 
estimate of ||a;fc|| can again be updated cheaply. The pseudo-code, omitted here, 
can be derived as in section |3.1.1| Experimentally we have observed that for every 
iteration, > is cither true or very nearly true. 

3.4. Estimates of and cond(A). It is known that the singular values of 
Bk are interlaced by those of A and are bounded above and below by the largest 
and smallest nonzero singular values of A [16]. Therefore we can estimate \\A\\ and 
cond(A) by |ji?fc|| and cond(i?fc) respectively. Considering the Frobenius norm of Bk, 
we have the recurrence relation 



- B. 



(2.6), we can show that the following QLP factorization 



l^^. From (12^-(123 and 
D holds: 



Qk+iBkQj, 



^k-i 



Ck-lPkj 



(the same as R[ except for the last diagonal). Since the singular values of Bk are 
approximated by the diagonal elements of that lower-bidiagonal matrix j23j , and since 
the diagonals are all positive, we can estimate cond(A) by the ratio of the largest and 
smallest values in {pi, . . . , pk-i, Ck-iPk}- Those values can be updated cheaply. 
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3.5. Stopping criteria. With exact arithmetic, the Golub-Kahan process ter- 
minates when either a^+i = or — 0. For certain data b, this could happen in 
practice when k is smaU (but is unhkely later) . We show that LSMR will have solved 
the problem at that point and should therefore terminate. 

When Qffc+i — 0, with the expression of H^-'^rfcll from section 3.2 we have 



l^^'-fell = ICfe+il = IsfcCfcl = l^'fe+iPfc^Cfel = \skak+iPk^Ck\ =0, 



where (2.10), (2.91, ^2.81 are used. Thus, a least-squares solution has been obtained. 



When Pk+i — 0, we have 

Sk = Pk+iPk^ = 0. 
Pk+i = -SkPk = 0. 



Pk = £k^ (/3fe - Sk{~l)''s^''^Ck+if3i) 



= Ck'Pk 
= Pk^PkPk 
= Pk^PkTk 
= Tk- 



(from (2.7)) 



(from (3.41, (3.7)) 



(from (A.6)) 



(from (3.7)) 



(from (3.5)) 



(from Lemma 3.11 



(from (A.2), (A.3)) 



(3.7) 
(3.8) 



(3.9) 



By (3.9), (3.81, and (3.6) we conclude that \\rk\\ = 0. It follows that Axk 



3.6. Practical stopping criteria. For LSMR we use the same stopping rules 
as LSQR [1^, involving dimcnsionless quantities ATOL, BTOL, CONLIM: 



SI 
S2 
S3 



Stop if llrfell < BT0L||6|| + ATOL|| A|l 
Stop if P^rfell < ATOL||A||||rfe|| 



Stop if cond(A) > CONLIM 

SI applies to consistent systems, allowing for uncertainty in A and 6 [101 Theorem 
7.1]. S2 applies to inconsistent systems and comes from Stewart's backward error 
estimate ||i?2|| assuming uncertainty in A\ see section 6.1 S3 applies to any system. 



4. Characteristics of the solution on singular systems. If A does not have 
full column rank, the normal equation A^Ax = A'^b is singular but consistent. We 
show that LSQR and LSMR both give the minimum-norm LS solution. That is, they 
both solve the optimization problem min ||a;||2 such that A^Ax = A^. Let N(A) and 
R(^) denote the nuUspace and range of a matrix A. 

Lemma 4.1. If Ae M™^" and p e E" satisfy A^Ap = 0, then p e N{A). 

Proof A^Ap = ^ p^A^Ap = ^ {ApYAp ^ Ap = 0. □ 

Theorem 4.2. LSQR returns the minimum-norm solution. 

Proof. The final LSQR solution satisfies A^Ax]^'^^ = A^b, and any other solution 
X satisfies A^Ax — A^. With p — x 
equations gives A'^Ap = 0, so that Ap = by Lemma 
ak+iVk+i = A'^Uk+i - Pk+iVk ( prj ), we have vi 
implies p^Vk — 0, so that 



^fc^'^^' difference between the two normal 



4.1 



,Vk 



From aivi = A^ui and 
'gR(A^). With Ap = 0, this 



ml 



LSQR||2 
^k II2 



X 



LSQR 



-P\\l 



LSQR|i2 
'^k II2 



T 

p p- 



o T LSQR 
2p X^'^ 



p^p + 2p^Vky]:'^''' 



p^p > 0. 



□ 
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Corollary 4.3. LSMR returns the minimum-norm solution. 

Proof. At convergence, ak+i = or 13^+1 — 0. Thus /S^+i — afe+i/?fe+i — 0, which 
means equation (2.2 ) becomes min |j/3iei — Bj^BkUkW scad hence B^B^yk = PiSi, since 
Bk has fuU rank. This is the normal equation for minWBkyk — /3iei||, the same LS 
subproblem solved by LSQR. We conclude that at convergence, yk = y]l^'^^ and thus 
Xk = Vkyk = Vkyl^^^ = xl^^^, and Theorem [Z2| applies. □ 

4.1. Complexity. We compare the storage requirement and computational com- 
plexity for LSMR and L SQR on Ax « b and MINRES on the normal equation 
A'^Ax = A^. 



In Table 



4.1 



we list the vector storage needed (excluding storage 
for A and b). Recall that ^ is to x n and for LS systems m may be considerably larger 
than n. Av denotes the working storage for matrix-vector products. Work represents 
the number of floating-point multiplications required at each iteration. 

Table 4.1 

Storage and computational requirements for various least-squares methods 





Stora^ 


;e 






Work 




TO 


n 






TO 


n 


LSMR 


Av, u 


X 


V, 


h, h 


3 


6 


LSQR 


Av, u 


X 




w 


3 


5 


MINRES on A^Ax = A^ 


Av 


X 




,V2,Wi,W2,W3 
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5. Regularized least squares. In this section, we extend LSMR to the regu- 
larized LS problem: 



(5.1) 



If A = ( ^ j and = - Axk, then 



A' rk =A'rk- X'xk = Vk+i l^Piei 
= Vk+i (piei 



RlRk 



2 (yk 





and the rest of the main algorithm follows the same as in the unregularized case. In 
the last equality, Rk is defined by the QR factorization 



'Bk 

l2k+l I 



Rk 




Q2k+l = PkPk ■ ■ ■ P2P2P1P1, 



where Pi is a rotation operating on rows I and Z + fc + 1. The effects of Pi and Pi are 
illustrated here: 



Pi 



/ ai 


\ 


P2 








X 




\ 


V 



P2 012 



V V 



Pi 



(oci \ 

P2 0:2 



(Pl 02\ 

a2 

V A/ 
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5.1. Effects on ||rfc||. The introduction of regularization changes the residual 
norm as follows: 



Xk = 



Pi 



AVk\ 
XVkJ 



Vk = 



fUk+iBk 
\ XVk 



Vk 



Uk+1 
Uk+1 
Uk+1 
Uk+1 



Vk 
Vk 
Vk 
Vk 



eiPi 
eiPi 
eiPi 

Q2k+1 



Bk 
XI 



Vk 



Rk 


tk 




Vk 



with hk = 



Qk 



Q2fc+iei/3i, where we adopt the notation 



bk = (/3i 



I3k Pk+i Pi 



We conclude that = /3i H V $1 + {Pk~ TkY + jSl^^. The effect of regularization 

on the rotations is summarized as 



Cfc Sk 
-Sk Cfe 



ctk 13 k 
A 



Cfc Sk 
-Sk Ck 



Oik Pk 
Pk+1 Oik + l 



Oik Pk 

Pk 

Pk Ok+1 Pk 

Oik+l Pk+1 



5.2. Pseudo-code for regularized LSMR. The following summarizes algo- 
rithm LSMR for solving the regularized problem (5.1 1 with given A. Our Matlab 
implementation is based on these steps. 
1. (Initialize) 



PiUi = b 


aivi 


= A^ui 


Oil 


= Oil 


Ci 


= aiPi 


Po 


= 1 


Po 


= 1 


Co = 1 


So 


= 


Pi 


= Pi 


Po 


= 


Po 


= 1 


T-l 


= 


00 = 


Co 


= 


da 


= 


hi 


= Vl 


ho 


= 


xo 


= 



2. For fc = 1, 2, 3, . . . repeat steps 3-12. 

3. (Continue the bidiagonalization) 

Pk+iUk+i = Avk - akUk 

4. (Construct rotation Pk) 

dk = {al + A2) ^ 

5. (Construct and apply rotation Pk) 



afe+i"fe+i = ■^'u.k+i - Pk+iVk 



Ck = ak/oik Sk = A/dfc 



Pk = i&l + Pl+i) 

^k + l = SkCtk+l 



Ck = OikI Pk 

Oik+l = Ckak+1 



Sk = Pk+i/pk 
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6. (Construct and apply rotation Pk 



Ok = Sk-iPk Pk = {{ck-iPkY + ^fc+i) ^ 

Cfe = Ck^ipk/pk Sk = Ok+i/pk 

Cfe = CfcCfe Cfe+i = —SkCk 

7. (Update h, x, h) 

hk = hk~ {OkPk/{Pk~iPk~i))hk-i 
Xk = Xk-i + {Ck/{pkPk))hk 
hk+i — Vk+i ~ {Ok+i/pk)hk 

8. (Apply rotation Pk,Pk) 

Pk = C-kh $k = -SkPk Pk = CkPk Pk+l = -SkPk 

9. (If k > 2, construct and apply rotation Pk-i) 

Pk-i^{pl-, + elY 

Cfe-i = Pk-\IPk~\ Sfc_i = Ok/pk^i 

6k = Sk-ipk Pk = Ck-iPk 

Pk^l = Ck-lPk-l + Sk^iPk Pk = -Sk~lPk~l + Ck-lPk 

10. (Update tk by forward substitution) 

Tk-l = (Cfe-1 - Ok~lTk-2) / Pk-l Tk = (Cfc - OkTk^l)/ Pk 

11. (Compute ||rfc||) 

dk = 4-1 +Pl l = dk + [Pk - Tkf + Pl+^ \\fk\\ = ^/7 

12. (Compute ||A^fj.||, \\xk\\, estimate ||A||, cond(74), and test for termination) 

\\A^fk\\ = |Cfc+i| (section [3:2t 
Compute ||a;fe|| (section 3.3) 

Estimate crmax(^fc): '''min(Sfc) and hence ||^||, cond(A) (section [s!!] ) 
Terminate if any of the stopping criteria are satisfied (section 3.6) 

6. Backward errors. For inconsistent problems with uncertainty in A (but not 
&), let x be any approximate solution. The normwise backward error for x measures 
the perturbation to A that would make x an exact LS solution: 

^(x) = minpll s.t. {A + Ef{A + E)x^{A + Efb. (6.1) 

E 

It is known to be the smallest singular value of a certain to x (n + to) matrix C; see 
Walden et al. [26] and Higham [lOl pp. 392-393]: 

p{x) = a„in(C), C=[a M(^/ - I^)" . 

Since it is generally too expensive to evaluate /i(a;), we need to find approximations. 
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6.1. Approximate backward errors Ei and E2. In 1975, Stewart 21 dis- 
cussed a particular backward error estimate that we will call Ei . Let x and r = b — Ax 
be the exact LS solution and residual. Stewart showed that an approximate solu- 
tion X with residual r = b — Ax is the exact LS solution of the perturbed problem 
min II & — (A + Ei)x\\, where Ei is the rank-one matrix 

6X^ lid 

with ||r||^ = ||r|p + ||e|p. Soon after, Stewart [22] gave a further important result 
that can be used within any LS solver. The approximate x and a certain vector 
f = b — (A + E2)x are the exact solution and residual of the perturbed LS problem 
min II & — {A + £'2)a;||, where 

^2--S^, ||i?2|| = ^^, r^b~Ax. (6.3) 

rll rll 

LSQR and LSMR both compute ||£'2|| for each iterate Xk because the current ||rfc|| 
and IjA-^rfcll can be accurately estimated at almost no cost. An added feature is that 
for both solvers, r — b — {A + E2)xk — because E2Xk = (assuming orthogonality 
of Vfc). That is, and rf. are theoretically exact for the perturbed LS problem 
{A + E2)xKb. 

Stopping rule S2 (section [3^ requires 11^^211 < ATOL||A||. Hence the following 
property gives LSMR an advantage over LSQR for stopping early. 

Theorem 6.1. \\E^^^^ < \\E^^^"\\. 

Proof. This follows from P^r^^^H < WA^r'f'^^W and ||?iSMR|| > ||rLSQR||. □ 

6.2. Approximate optimal backward error /i(a;). Various authors have de- 
rived expressions for a quantity Ji.{x) that has proved to be a very accurate approx- 



imation to /i(x) in (6.1 1 when x is at least moderately close to the exact solution x. 
Grcar, Saunders, and Su ^24|i8j show that /i(x) can be obtained from a full-rank LS 
problem as follows: 



uim\\Ky-vl Ji{x) ^ \\Ky\\l\\xl (6.4) 
y 

and give the following Matlab script for computing the "economy size" sparse QR 
factorization K — QR and c = Q'^v (for which ||c|| — \\Ky\\) and thence 'il{x): 



[m,n] = size(A) ; r = b - A*x; 

normx = norm(x) ; eta = norm(r)/normx; 

p = colaind(A) ; 

K = [A(:,p); eta*speye(n)] ; 
V = [ r ; zeros(n,l)]; 

[c,R] = qr(K,v,0); mutilde = norm ( c ) /normx ; 



K 



A 



In our experiments we use this script to compute fi{x}~) for each LSQR and LSMR 
iterate Xk- We refer to this as the optimal backward error for xu because it is provably 
very close to the true ^(xfc) [Zj. 
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6.3. Related work. More precise stopping rules have been derived recently by 
Arioli and Gratton [T] and Titley-Peloquin et al. |3J HSl US]- The rules allow for 
uncertainty in both A and b, and may prove to be useful for LSQR, LSMR, and 
least-squares methods in general. However, we would like to emphasize that rule S2 
already terminates LSMR significantly sooner than LSQR on most of our inconsistent 



test cases; see Theorem 6.1, Fig. |7.2| (left), and Fig. 7.3 (top left). 



7. Numerical results. For test examples, we have drawn from the University 
of Florida Sparse Matrix Collection (Davis [5]). We discuss overdetermined systems 
first, and then some square examples. 

7.1. Least-squares problems. The LPnetlib group provides data for 138 lin- 
ear programming problems of widely varying origin, structure, and size. The con- 
straint matrix and objective function may be used to define a sparse LS problem 
min — Each example was downloaded in Matlab format, and a sparse matrix 
A and dense vector b were extracted from the data structure via A = (Problem. A) ' 
and b = Problem. c (where ' denotes transpose). 

Five examples had 6 = 0, and a further six gave A^b — 0. The remaining 127 
problems had up to 243000 rows, 10000 columns, and 1.4M nonzeros in A. Diagonal 
scaling was applied to the columns of [A 5] to give a scaled problem min \\Ax — b\\ 
in which the columns of A (and also b) have unit 2-norm. LSQR and LSMR were run 
on each of the 127 scaled problems with stopping tolerance ATOL = 10~^, generating 
sequences of approximate solutions {a;);;^*^^ } and {a;^^'^^}- The iteration indices k are 
omitted below. The associated residual vectors are denoted by r without ambiguity, 
and X* is the solution to the LS problem, or the minimum-norm solution to the LS 
problem if the system is singular. 

As expected, the optimal residual is nonzero in all cases. We record some general 
observations. 

1. ||r'"^'^^ II is monotonic by design. ||7-LSMR|| gggj^^g i^q monotonic (no counter- 
examples were found) and nearly as small as ||r^^Q^|| for all iterations on 



almost all problems. Figure 7.1 shows a typical example and a rare case. 



Name:lp greenbeb, Dim:5598x2392, nnz:31070, id=631 



Name:[p woodw, Dim:8418x1 098, nnz:37487, id=702 




150 200 250 300 



20 30 



50 60 70 80 90 



Fig. 7.1. For most iterations, ||r^'5^^|| appears to be monotonic and nearly as small as 
||j-,LSQfl|j A typical case (problem lp_greenbeb). Right: A rare case (problem lp_woodw). 

LSMR's residual norm is significantly larger than LSQR's during early iterations. 
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Name:lp pilot ja, Dim:2267x940, nnz:14977, id=657 Name:lp sc205, Dim:31 7x205, nnz:665, id=665 




Fig. 7.2. For most iterations, ||i?2 II appears to be monotonia (but ||i?2 II *^ not). Left: 
A typical case (problem lp_pilot_ja) . LSMR is likely to terminate much sooner than LSMR (see 
Theorem \6. Right: Sole exception (problem IpscZOS) at iterations 54-67. The exception remains 
even if Uk and/or Vfc are reorthogonalized. 



\\x\\ is nearly monotonic for LSQR and even more closely monotonia for 
LSMR. With ||r|| monotonic for LSQR and essentially so for LSMR, ||£'i|| 



in (6.2 1 is likely to appear monotonic for both solvers. Although is not 

normally available for each iteration, it provides a benchmark for ||i?2||- 

WE^^'^^W is not monotonic, but appears monotonic almost always. 

Figure |7.2| shows a typical case. The sole exception for this observation is 
also shown. 

Note that Benbow [2] has given numerical results comparing a generalized 
form of LSQR with application of MINRES to the corresponding normal 
equation. The curves in [31 Figure 3] show the irregular and smooth behavior 
of LSQR and MINRES respectively in terms of Those curves are 

effectively a preview of the left-hand plots in Figure |7.2| (where LSMR serves 
as our more reliable implementation of MINRES). 

||^LSQR|| ^ ||£:LSQR|| often, but not so for LSMR. Some examples are shown 



on Figure 7.3 along with ii{xk), the accurate estimate (6.4) of the optimal 
backward error for each point x^. 

|^LSMR|| ^ -(^.LSMR) ^Imost always. Fi gure |7.4| shows a typical example 



and a rare case. In all such "rare" cases, w fj,{x^^^^') instead! 

~(^^i-SQR^ is not always monotonic. Jl{x^^^^) does seem to be monotonic. 
Figure [73] gives examples. 



^^•^.LSMR-j ^ ^^^LSQR-j almost always. Figure 7.6 gives examples. 

The errors ||a;* — and — x'^^'^^H seem to decrease monotonically, 

with the LSQR error typically smaller than for LSMR. Figure [TT] gives exam- 
ples. This is one property for which LSQR seems more desirable (and it has 
been suggested [18^ that for LS problems, LSQR could be terminated when 
rule S2 would terminate LSMR). 
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Name:lpcre a, Dim:7248x3516, nnz:18168, id=609 



Name:lp pilot, Dim;4860x1441, nnz:44375, id=654 




Fig. 7.3. ||£^2||) and fi{xk) for LSQR (top figures) and LSMR (bottom figures) . Top left: 

A typical case. \\E^^^^\\ is close to the optimal backward error, but the computable is 



not. Top right: A rare case in which is close to optimal. Bottom left: 



rLSMRw 



;<LSMRu 



are often both close to the optimal backward error. Bottom right: \\E^ 



LSMRu 



^1 



is far from 



optimal, but the computable ||£^2''^"'^^|| is almost always close (too close to distinguish in the plot!). 
Problems lp_cre_a (left) and Ip.pilot (right). 



Name:lpken 11, Dim:21 349x14694, nnz:49058, icl=638 Name:lp ship12l, Dim:5533x1151, nnz:16276, icl=688 




50 100 150 200 250 10 20 30 40 50 60 70 80 90 



Fig. 7.4. Again, \\E2^^^\\ ^ fi{x^^^^) almost always (the computable backward error esti- 
mate is essentially optimal). Left: A typical case (problem Ip-ken-ll) . Right: A rare case (problem 
lp_shipl2l). Here, ^ ^(^^LSMR-^, 
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Name:lp maros, Dim:1966x846, nnz:10137, id=642 



Name:lpcre c, Dim:641 1x3068, nnz:15977, id=611 



100 200 300 400 500 600 700 800 900 



200 400 



1000 1200 1400 



Fig. 7.5. Jl{x^S'^^) seems to be always monotonic, but jj,{x^^Q^) is usually not. Left: A 
typical case for both LSQR and LSMR (problem lp_maros). Right: A rare case for LSQR, typical 
for LSMR (problem Ip^cre^c). 



Name:lp pilot, Dim:4860x1441 , nnz:44375, id=654 



Nameilp standgub, Dim:1 383x361, nn2:3338, id=693 




100 200 300 400 500 000 



Fig. 7.6. fi{x^^^^) < fi{x^^Q^) almost always. Left: A typical case (problem lp_pilot). 
Right: A rare case (problem Ipstandgub) . 



Nameilp ship12l. Dim:5533x1151. nnz:16276, id=688 



Name:lp pds 02, Dim:7716x2953, nnz:16571 , id=649 



70 80 90 




60 70 



Fig. 7.7. The errors \\x* — x^^^^\\ and \\x* — x^^^^\\ seem to decrease monotonically, with 
LSQR's errors smaller than for LSMR. Left: A nonsingular LS system (problem lp_shipl2l). Right: 
A singular system (problem lp_pds_02). LSQR and LSMR both converge to the minimum-norm LS 
solution. 
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Name:Hamm.hcircuit, Dim;105676x105676, nnz:513072, id=542 



Name:Hamm.hcircuit, Dim:105676x105676, nnz:513072, id=542 




0000 12000 2000 4000 6000 



10000 12000 



Fig. 7.8. LSQR and LSMR solving two square nonsingular systems Ax = b: problems 
Hamm/hcircuit (top) and IBM_EDA/trans5 (bottom). Left: logjQ||rfc|| for both solvers, with pro- 
longed plateaus for LSMR. Right: logj^Q ||yl-^r'fcj| (preferable for LSMR). 



7.2. Square systems. Since LSQR and LSMR are applicable to consistent sys- 
tems, it is of interest to compare them on an unbiased test set. We used the search 
facility of Davis [5] to select a set of square real linear systems Ax = b. With 
index = UFget, the criteria 



ids = find (index. nrows > 100000 

index. nrows == index. ncols 
index. posdef == 



index. nrows < 200000 & ... 
index. isReal ==1 & ... 
index. nuinerical_symmetry < 1) ; 



returned a list of 42 examples. Testing isf ield (UFget (id) , 'b ' ) left 26 cases for 
which b was supplied. For each, diagonal scaling was first applied to the rows of 
[A fe] and then to its columns to give a scaled problem Ax = 6 in which the columns 
of ^A 6] have unit 2-norm. In spite of the scaling, most examples required more than 
n iterations of LSQR or LSMR to reduce ||rfc|| satisfactorily (rule SI in section 3.6 
with ATOL = BTOL = 10^^). To simulate better preconditioning, we chose two 
cases that required about n/5 and n/10 iterations. Figure 7.8 (left) shows both 
solvers reducing \\rk\\ monotonically but with plateaus that are prolonged for LSMR. 



With loose stopping tolerances, LSQR could terminate somewhat sooner. Figure [778| 
(right) shows ||yl-^rfc|| for each solver. The plateaus for LSMR correspond to LSQR 
gaining ground with ||rfc||, but falling significantly backward by the ||j4^r/j|| measure. 
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10 EO 30 40 50 60 70 80 90 500 1000 1500 EOOO 2500 3000 

Fig. 7.9. LSMR with and without reorthogonalization of Vk and/or U^. Left: An easy case 
where all options perform similarly (problem lp_shipl2l). Right: A helpful case (problem Ip.gran). 



7.3. Reorthogonalization. It is well known that Krylov-subspace methods can 
take arbitrarily many iterations because of loss of orthogonality. For the Golub-Kahan 
bidiagonalization, we have two sets of vectors Uk and Vk- As an experiment, we 
implemented the following options in LSMR: 

1. No reorthogonalization. 

2. Reorthogonalize Vk (that is, reorthogonalize with respect to V^^i). 

3. Reorthogonalize Uk (that is, reorthogonalize Uk with respect to Uk-i)- 

4. Both 2 and 3. 

Each option was tested on all of the over-determined test problems with fewer than 
16K nonzeros. Figure [7^ shows an "easy" case in which all options converge equally 
well (convergence before significant loss of orthogonality), and an extreme case in 
which reorthogonalization makes a large difference. 

Unexpectedly, options 2, 3, and 4 proved to be indistinguishable in all cases. To 
look closer, we forced LSMR to take n iterations. Option 2 (with Vk orthonormal 
to machine precision e) was found to be keeping Uk orthonormal to at least 0{^/e). 
Option 3 (with Uk orthonormal) was not quite as effective but it kept Vk orthonormal 
to at least 0{y/e) up to the point where LSMR would terminate when ATOL = y/e. 

Note that for square or rectangular A with exact arithmetic, LSMR is equivalent 
to MINRES on the normal equation (and hence to the conjugate-residual method [T^] 
and GMRES [20 on the same equation). Reorthogonalization makes the equivalence 
essentially true in practice. We now focus on reorthogonalizing Vk but not Uk- 

Other authors have presented numerical results involving reorthogonalization. For 
example, on some randomly generated LS problems of increasing condition number, 
Hayami et al. [3] compare their BA-GMRES method with an implementation of CGLS 
(equivalent to LSQR \W) in which Vk is reorthogonalized, and find that the methods 
require essentially the same number of iterations. The preconditioncr chosen for BA- 
GMRES made that method equivalent to GMRES on A^Ax = A^. Thus, GMRES 
without reorthogonalization was seen to converge essentially as well as CGLS or LSQR 
with reorthogonalization of Vk (option 2 above). This coincides with the analysis by 
Paige et al. [T3], who conclude that MGS-GMRES does not need reorthogonalization 
of the Arnoldi vectors Vk- 
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Name:lp maros, Dim;1966x846, nnz:10137, id=642 Name:lp ere c, Dim:641 1x3068, nnz:15977, id=61 1 




100 200 300 400 500 600 700 BOO 900 500 1000 1500 3000 2500 3000 3500 

iteration count iteration count 



Fig. 7.10. LSMR with reorthogonalized Vk and restarting. Restart(£) with £ = 5, 10,50 is slower 
than standard LSMR with or without reorthogonalization. Problems lp_maros and lp_cre_c. 




Fig. 7.11. LSMR with local reorthogonalization of V^. Local(l) with t = 5,10,50 illustrates 
reduced iterations as I increases. Problems Ip-fitlp and lp-bnl2. 



7.3.1. Restarting. To conserve storage, a simple approach is to restart the 



algorithm every i steps, as with GMRES(£) [20]. Figure 7.10 shows that restarting 
LSMR even with full reorthogonalization (of Vk) may lead to stagnation. In general, 
convergence with restarting is much slower than LSMR without reorthogonalization. 

7.3.2. Local reorthogonalization. Here we reorthogonalize each new Vk with 



respect to the previous I vectors, where I is a specified parameter. Figure [7H] shows 
that I = 5 has little effect, but partial speedup was achieved with I = 10 and 50 in the 
two chosen cases. There is evidence of a useful storage-time tradeoff. The potential 
speedup depends strongly on the computational cost of Av and A^u. 

7.3.3. Partial reorthogonalization. Larsen [19] uses partial reorthogonaliza- 
tion of both Vfc and C/fc within his PRO PACK software for computing a set of singular 
values and vectors for a sparse rectangular matrix A. Similar techniques might prove 
helpful within LSMR. We leave this for future research. 



8. Summary. We have presented LSMR, an iterative algorithm for square or 
rectangular systems, along with details of its implementation and experimental results 
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to suggest that it has advantages over the widely adopted LSQR algorithm. 

As in LSQR, theoretical and practical stopping criteria are provided for solving 
Ax = b and niin||j4a; — 6|| with optional Tikhonov regularization, using estimates of 
||rfc||, ll^'^rfcll, \\xk\\, \\A\\ and cond(A) that are cheaply computable. For LS problems, 
the Stewart backward error estimate ||i?2|| (6.3| seems experimentally to be very close 



to the optimal backward error ^{xk) at each LSMR iterate x^ (section 6.2 ). This often 
allows LSMR to terminate significantly sooner than LSQR. 

Experiments with full reorthogonalization have shown that the Golub-Kahan pro- 
cess retains high accuracy if the columns of either Vk or Uk are reorthogonalized. There 
is no need to reorthogonalize both. This discovery could be helpful for other uses of 
the Golub-Kahan process. 

Matlab, Python, and Fortran 90 implementations of LSMR are available from 
[TT] . They all allow local reorthogonalization of ■ 

Acknowledgements. We are grateful to Chris Paige for his helpful comments on 
reorthogonalization and other aspects of this work. We are also grateful to two referees 
for their extremely helpful and perceptive reviews. Further thanks go to Martin van 
Gijzen and Mike Botchev for their help with testing LSMR on square systems arising 
from convection-diffusion problems, to Sou-Cheng Choi for her helpful comments, and 
to Victor Pereyra for proposing that LSMR be used to terminate LSQR if a smaller 
final error ||x — x^W is important. 



Appendix A. Proof of Lemma |3.1[ 

The effects of the rotations Pk and Pk-i can be summarized as 



Rk 



(Pi k 



\ 



\ 



PkJ 



-Sk Ck) \ Q 

Cfc Sk\ f Pk-1 

-Sk Ck) \ Ok 



Pk 
Jk+ 

pk Pk 



Pk-i 




Ok 

Pk 



13k 



where /3i = pi = pi, /3i = /3i and where Ck, Sk are defined in section 
We define s^*^^ = si . . . Sk and s'-'^^ = si . . . Sk- Then from (3.31 and 
Rkik = Zk ^ {Ik 0) Qk+iCk+iPi. Expanding this and ( |3.1[ ) gives 



2.6 



2.4 1 we have 



Rj-ik 



Cl 
-SlC2 



/ 



/9i 



bk = 



Qk 



\^(_l)fe+l,-(/c-l)^J 

and we see that 

fi = Pi^CiPi 



Cl 
-SlC2 



(-l)'=+is('=-i)cfe 

y (_l)fc+2g(fc) J 



= p-ii((-l)'=s("-')cfe_i^i - 0k-lfk-2) 

rk = Pk\{-l)''+'s^''-^^CkPi - ~6kTk-i). 
Pk = ikPk + ~3k{-~lfs^'''^Ck+iPi. 



(A.l) 
(A.2) 
(A.3) 
(A.4) 
(A.5) 
(A.6) 
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We want to show by induction that fi — Pi for all i. When i = 1, 

Pi = CiCiPi - S1S1C2P1 = — (ClPl - O2S1C2) = — = — — = — Ci = Ti 

Pi Pi Pi Pi Pi Pi Pi 

where the third equality follows from the two lines below: 

Cipi - 6/2S1C2 = cipi - O2S1 ^ Pi - ^2Si — = — (pi Q2S1OL2) 

P2 P2 Pi P2 

Pi t'2.Sl«2 = Pi (SlP2)6'2 = Pi - —O2 = = = = ■ 

P2 P2 Pi Pi Pi 

Suppose ffc_i = /3fe-i. We consider the expression 

s^'~'^c,-p-,'cl-iPlPi = ''^{s^'-'^c.rc^iPkPi 

Pk 

_ 02 - ■ - Okai Pi - ■ ■ Pk-1 n - ^2 0k a 

= Ck z PkPl = Ck- ^ Pi 

Pl---Pk Pi- ■■ Pk-1 Pi Pk-1 

= CkSi---Sk-iPi = CfeS^^-^^A. (A.7) 

Applying the induction hypothesis on fk = p^^ l)''+^s('^^^'cfe/3i — OkTk-ij gives 
rk = Pk' ((-l)'^+'s('=-i)cfe/3i - Ok [dk-iPk-i+Sk-i{-l)''s^''-'^CkPi)) 

= pl^hck-lPk-l + ' (s^'^'^Cfc/?! - h~Sk-J''-^^CkPl) 

= Pk'{PkSk-i)ck-iPk-i + {-l)''^^Pk's^''''^'>(3i {pkCk-iCk - Ok+iSkCk+i) 
= CkSk-iPk-i + (-l)''+^s(''"^'/3i (ckCk-iCk - SkSkCk+i) 
= Ck (^-Sk-iPk-i + £k-i{-l)''+'s^''-^^CkPi) + Sfe(-l)'=+is('=)cfe+i/3i 
= £kPk + Sk{-l)''+^s^'''>Ck+iPi = Pk 
with the second equality obtained by the induction hypothesis, and the fourth from 

s'^'-'^CkPi - OkSk-is'^'^^'^CkPi = s^'^-'^CkPk'cl-iPlPi - {Sk-iPk)Sk-is^'"'^CkPi 

2_,2 2) 

Pk 

= S*^''"^'/?! (pkCk-lCk - Ok+lSkCk+l) , 



where the first equality follows from (A.7) and the last from 

-2 2 ~2 -2 _ I -2 n2 \ ~2 -2 _ -2 ( -i ~2 \ n2 _ -2-2 n2 
'^k-lPk ~ ^k-lPk — \Pk ^k+l) ^k-lPk — Pk\^ ^fc-li '^k+l — Pk^k-1 ^ ^k+li 

Ck -2-2 - -2 ■ - 

— Pk^k-l = PkCk-lCk = PkCk-lCk, 

Pk 

Cfe a2 ^k+1 a &k+lPk+l Ck ^ 

-^^k+i — ^ — ffc+iCfe — = Ska.k+1 — ffc+iSfeCfc+i. 

Pk Pk Pk Pk+1 



Therefore by induction, we know that fi = j3i for i — 1,2, ... . From (3.3), we see that 
at iteration k, the first fc — 1 elements of hk and tk are equal. □ 



LSMR: AN ITERATIVE ALGORITHM FOR LEAST SQUARES 



21 



REFERENCES 

M. Arioli and S. Gratton, Least-squares problems, normal equations, and stopping crite- 
ria for the conjugate gradient method, Technical Report RAL-TR-2008-008, Rutherford 
Appleton Laboratory, Oxfordshire, UK, 2008. 

S. J. Benbow, Solving generalized least-squares problems with LSQR, SIAM J. Matrix Anal. 
Appl., 21 (1999), pp. 166-177. 

X.-W. Chang, C. C. Paige, and D. Titley-Peloquin, Stopping criteria for the iterative 
solution of linear least squares problems, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 831- 
852. 

S.-C. Choi, C. C. Paige, and M. A. Saunders, MINRES-QLP: a Krylov subspace method for 

indefinite or singular symmetric systems, SIAM J. Sci. Comput., 33 (2011), pp. 00-00. 
T. A. Davis, University of Florida Sparse Matrix Collection. |http : //miw ■ else ■ uf 1 . eduT] 

research/sparse/matrices 
G. H. GOLUB AND W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, 

J. of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2 

(1965), pp. 205-224. 

S. Gratton, P. Jiranek, and D. Titley-Peloquin, On the accuracy of the Karlson-Walden 
estimate of the backward error for linear least squares problems. Manuscript, Feb 22, 2011. 

J. F. GrCAR, M. a. Saunders, and Z. Su, Estimates of optimal backward perturbations for 
linear least squares problems. Report SOL 2007-1, Department of Management Science and 
Engineering, Stanford University, Stanford, CA, 2007. 21 pp. 

K. Hayami, J.-F. Yin, and T. Ito, GMRES methods for least squares problems, SIAM J. 
Matrix Anal. Appl., 31 (2010), pp. 2400-2430. 

N. J. HiGHAM, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, sec- 
ond ed., 2002. 

LSMR software for linear systems and least squares. |http: //www. Stanford. edu/group/SOL/| 
software. html 

D. G. Luenberger,, The conjugate residual method for constrained minimization problems, 

SIAM J. Numer. Anal., 7 (1970), pp. 390-398. 
P. ,7lRANEK AND D. Titley-Peloquin, Estimating the backward error in LSQR, SIAM J. 

Matrix Anal. Appl., 31 (2010), pp. 2055-2074. 
C. C. Paige, M. Rozloznik, and Z. Strakos, Modified Gram-Schmidt (MGS), least squares, 

and backward stability of MGS-GMRES, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 264- 

284. 

C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, 
SIAM J. Numer. Anal., 12 (1975), pp. 617-629. 

, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. 

Math. Softw., 8 (1982), pp. 43-71. 

Algorithm 583; LSQR: Sparse linear equations and least-squares problems, ACM Trans. 



Math. Softw., 8 (1982), pp. 195-209. 
V. Pereyra, 2010. Private communication. 

PROPACK software for SVD of sparse matrices. Ihttp : //soi . Stanford . edu/-rmunk/PROPACK/] 

Y. Saad and M. H. Schultz, GMRES: a generalized minimum residual algorithm for solving 

nonsymmetric linear systems, SIAM J. Sci. and Statist. Comput., 7 (1986), pp. 856-869. 
G. W. Stewart, An inverse perturbation theorem for the linear least squares problem, 

SIGNUM Newsletter, 10 (1975), pp. 39-40. 
, Research, development and LINPACK, in Mathematical Software III, J. R. Rice, ed., 

Academic Press, New York, 1977, pp. 1-14. 
, The QLP approximation to the singular value decomposition, SIAM J. Sci. Comput., 

20 (1999), pp. 1336-1348. 
Z. Su, Computational Methods for Least Squares Problems and Clinical Trials, PhD thesis, 

SCCM, Stanford University, 2005. 
D. Titley-Peloquin, Backward Perturbation Analysis of Least Squares Problems, PhD thesis, 

School of Computer Science, McGill University, 2010. 
B. Walden, R. Karlson, and J.-G. Sun, Optimal backward perturbation bounds for the linear 

least squares problem. Numerical Linear Algebra with Applications, 2 (1995), pp. 271-286. 



